On Using Extended Statistical Queries to Avoid Membership Queries

Authors

Nader H. Bshouty

Vitaly Feldman

Abstract

The Kushilevitz-Mansour (KM) algorithm finds all the “large” Fourier coefficients of a Boolean function. It is the main tool for learning decision trees and DNF expressions in the PAC model with respect to the uniform distribution. The algorithm requires access to the membership query (MQ) oracle. This access is often unavailable in learning applications, so the KM algorithm cannot be used. We significantly weaken this requirement by producing an analogue of the KM algorithm that uses extended statistical queries (SQs in which the expectation is taken with respect to a distribution given by the learning algorithm). We restrict the set of distributions that a learning algorithm may use for its statistical queries to product distributions in which each bit is 1 with probability ρ, 1/2 or 1−ρ for a constant 1/2 > ρ > 0 (we denote the resulting model by SQ–Dρ). Our analogue, which we call the Bounded Sieve (BS), finds all the “large” Fourier coefficients of degree lower than c log n. We use BS to learn decision trees, and by adapting Freund’s boosting technique we give an algorithm that learns DNF in SQ–Dρ. An important property of the model is that its algorithms can be simulated by MQs with persistent noise. With some modifications, BS can also be simulated by MQs with product attribute noise (i.e., for a query x the oracle changes every bit of x with some constant probability and calculates the value of the target function at the resulting point) and classification noise. This implies learnability of decision trees and weak learnability of DNF with this non-trivial noise. In the second part of this paper we develop a characterization for learnability with these extended statistical queries. We show that our characterization, when applied to SQ–Dρ, is tight in terms of learning parity functions. We extend the result given by Blum et al. by proving that there is a class learnable in the PAC model with random classification noise and not learnable in SQ–Dρ.

©2002 Nader Bshouty and Vitaly Feldman.
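
To make the objects named in the abstract concrete, the following is a minimal sketch and not the KM or Bounded Sieve algorithm from the paper. It assumes Boolean functions take values in {−1, +1} on inputs from {0,1}^n, computes Fourier coefficients exactly by brute-force enumeration (feasible only for small n), and evaluates an idealised extended statistical query as an exact expectation under a product distribution, with no tolerance parameter. All function names (`chi`, `fourier_coefficient`, `large_low_degree_coefficients`, `extended_sq`) are hypothetical and chosen for illustration.

```python
from itertools import product


def chi(a, x):
    """Parity character: -1 if the bits of x selected by a sum to an odd number, else +1."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1


def fourier_coefficient(f, a, n):
    """Exact coefficient E[f(x) * chi_a(x)] under the uniform distribution on {0,1}^n."""
    total = sum(f(x) * chi(a, x) for x in product((0, 1), repeat=n))
    return total / 2 ** n


def large_low_degree_coefficients(f, n, theta, max_degree):
    """Brute-force stand-in for a sieve: list every index vector a with fewer than
    max_degree ones whose coefficient has absolute value at least theta."""
    found = {}
    for a in product((0, 1), repeat=n):
        if sum(a) < max_degree:
            c = fourier_coefficient(f, a, n)
            if abs(c) >= theta:
                found[a] = c
    return found


def product_prob(x, p):
    """Probability of x when bit i is 1 independently with probability p[i]."""
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi if xi else 1 - pi
    return prob


def extended_sq(g, f, p, n):
    """Idealised extended statistical query: exact E[g(x, f(x))] for x drawn from
    the product distribution p (a real SQ oracle answers only up to a tolerance)."""
    return sum(product_prob(x, p) * g(x, f(x)) for x in product((0, 1), repeat=n))
```

In this sketch, a query in the SQ–Dρ style would pass a vector `p` whose entries are each ρ, 1/2 or 1−ρ. For example, with `p = (0.25, 0.5, 0.5)` and the target `f(x) = (-1)**x[0]`, the query `extended_sq(lambda x, y: y, f, p, 3)` evaluates the bias of the target under that distribution, which is 1 − 2ρ.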

Similar resources

Improving the View Selection Algorithm in a Data Warehouse by Finding Frequent Queries

A data warehouse is a source for storing historical data to support decision making. Analytic queries usually take a long time. To solve the response-time problem, some views should be materialized so that all queries are answered in minimum response time. There are many solutions to the view selection problem. The most appropriate solution for view selection is materializing frequent queries. Previously posed ...

A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries

Data streams are infinite, fast, time-stamped data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way, so algorithms that process data streams and answer queries on them are mostly one-pass. The execution of such algorithms faces challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

Improvement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation

A real-time data warehouse is a collection of recent and hierarchical data that is used for managers’ decision-making by creating online analytical queries. The volume of data collected from data sources and entered into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data to the real time data warehouse increases, the interference between online loading ...

On Learning Branching Programs and Small Depth Circuits

This paper studies the learnability of branching programs and small depth circuits with modular and threshold gates in both the exact and PAC learning models with and without membership queries. Some of the results extend earlier works in [GG95, ERR95, BTW95]. The main results are as follows. For branching programs we show the following. 1. Any monotone width two branching program (defined by Bor...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As we move into a big-data world where unstructured data grows at high velocity, a data store is needed to hold this volume of data. Relational databases, the traditional choice, are not suited to handling data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...

Journal title:

Journal of Machine Learning Research

Volume 2

Publication date: 2001